# load the tidyverse library (which includes dplyr and ggplot2)
library(tidyverse) # or library(ggplot2) and library(dplyr)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.0 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.0
✔ ggplot2 3.4.1 ✔ tibble 3.2.1
✔ lubridate 1.9.2 ✔ tidyr 1.3.0
✔ purrr 1.0.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# load the gapminder dataset for this lesson
gapminder <- read.csv("data/gapminder_data.csv")
ggplot2 is built on the grammar of graphics, which builds plots in layers.
Let’s start off with an example:
# use ggplot to initialize a plot of gapminder's gdpPercap (x) and lifeExp (y)
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  # add a scatterplot (points) layer
  geom_point()
The two top-level functions we have used are ggplot() and geom_point().
Notice the use of + to add a layer.
ggplot():
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
This function lets R know that we’re creating a new plot, and any of the arguments we give the ggplot function apply to all layers of our plot.
We’ve passed in two arguments to ggplot:
data = gapminder: tells ggplot what data we want to show on our figure
mapping = aes(x = gdpPercap, y = lifeExp): tells ggplot how variables in the data should map to aesthetic properties (e.g., the x and y coordinates).
geom_point() adds a scatterplot using the data and global aesthetics we specified in ggplot().
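To make the layering idea concrete, the sketch below stores the result of ggplot() in a variable and adds the points layer afterwards. So that it runs on its own, it uses a small made-up data frame, gapminder_toy (a hypothetical stand-in with the same column names, not real gapminder values); with the lesson data loaded you would pass gapminder instead. The variable names base_plot and scatter_plot are also just illustrative.

```r
library(ggplot2)

# Hypothetical stand-in for the lesson data: made-up values, same column names
gapminder_toy <- data.frame(
  gdpPercap = c(500, 2000, 8000, 30000),
  lifeExp   = c(45, 58, 70, 80)
)

# ggplot() alone sets up the data and the global aesthetics but draws no points
base_plot <- ggplot(data = gapminder_toy, mapping = aes(x = gdpPercap, y = lifeExp))

# adding a geom with + returns a new plot object that includes a points layer
scatter_plot <- base_plot + geom_point()

length(base_plot$layers)     # number of layers before adding the geom
length(scatter_plot$layers)  # number of layers after adding the geom
scatter_plot
```

Printing base_plot on its own shows only the axes, which is a quick way to see that the geom layer is what actually draws the data.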
Challenge 1
Create a scatterplot of GDP per capita (x) versus lifeExp (y) using just the gapminder data from 2007.
Hint: use the pipe to pipe the output of a filter() function into the ggplot() function
Solution to challenge 1
To use just the gapminder data from 2007, we can use filter() to keep only the rows where year is 2007 and then pipe the result into the first argument of ggplot().
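One way this solution could be written is sketched below. To stay self-contained it runs on gapminder_toy, a hypothetical stand-in with made-up values and the same column names as the lesson data; replace it with gapminder to reproduce the challenge. The name plot_2007 is only illustrative.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for gapminder: made-up values, same column names
gapminder_toy <- data.frame(
  country   = c("Country A", "Country A", "Country B", "Country B"),
  year      = c(2002, 2007, 2002, 2007),
  gdpPercap = c(900, 1100, 24000, 27000),
  lifeExp   = c(52, 55, 77, 79)
)

# keep only the 2007 rows, then pipe them into ggplot() as its data argument
plot_2007 <- gapminder_toy |>
  filter(year == 2007) |>
  ggplot(mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point()

plot_2007
```

Because the piped data frame fills the data argument, the mapping is passed by name here.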
Challenge 2
In the previous examples and challenge we’ve used the aes() function to tell the scatterplot geom about the x and y locations of each point. Another aesthetic property we can modify is the point color.
Modify the code from the previous challenge to color the points by the “continent” column. What trends do you see in the data? Are they what you expected?
Solution to challenge 2
The solution presented below adds color = continent to the call of the aes() function. The general trend is that life expectancy increases with GDP per capita: continents with stronger economies tend to have longer life expectancy.
gapminder |>
  filter(year == 2007) |>
  ggplot(mapping = aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point()
Captions via code chunk options
You can add a caption to a figure in a quarto document by supplying a label and fig-cap quarto chunk option:
gapminder |>
  filter(year == 2007) |>
  ggplot(mapping = aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point()
Figure 1: GDP per capita vs life expectancy
Let’s compile our document to check that a caption appeared for this figure!
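The chunk options themselves are not displayed in the rendered output, so here is a minimal sketch of what the body of such a chunk could look like. The label fig-gdp-lifeexp is a hypothetical name (Quarto expects figure labels to start with fig- for numbering and cross-references), and the caption text is taken from the figure shown above. In a .qmd file the chunk would be opened with {r}, and the #| lines must come first inside it. The data frame gapminder_toy is a made-up stand-in so that the code runs on its own.

```r
#| label: fig-gdp-lifeexp
#| fig-cap: "GDP per capita vs life expectancy"

library(ggplot2)

# Hypothetical stand-in for the 2007 gapminder rows: made-up values
gapminder_toy <- data.frame(
  continent = c("Africa", "Africa", "Europe", "Europe"),
  gdpPercap = c(900, 1500, 25000, 32000),
  lifeExp   = c(52, 57, 78, 80)
)

caption_plot <- ggplot(gapminder_toy,
                       aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point()

caption_plot
```

With a label like this, the figure can be referred to in the text as @fig-gdp-lifeexp.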
Line plots
Let’s try to visualize life expectancy for each country over time, coloring our lines by continent.
# use ggplot() to plot lifeExp versus year as a line plot
# and try to color the lines by continent
ggplot(gapminder, aes(x = year, y = lifeExp, color = continent)) +
  geom_line()
Our plot looks strange… what’s going on in this plot?
We haven’t told ggplot that we want a separate line for each country.
We can do that by adding a group argument inside the aes() function:
# use ggplot() to plot lifeExp versus year as a line plot and group by country
# and try to color the lines by continent
ggplot(gapminder, aes(x = year, y = lifeExp, color = continent, group = country)) +
  geom_line()
Multiple geom layers
We can visualize both lines and points on the same plot by adding multiple geom_*() layers:
# Add a points layer to the line plot above
ggplot(gapminder, aes(x = year, y = lifeExp, group = country, color = continent)) +
  geom_line() +
  geom_point()
Supplying local layer aesthetics
In the example above, the aesthetics (from aes()) are applied to both layers.
To apply an aesthetic just to one layer, you can supply a separate aes() function to the layer:
# recreate the plot above but apply the color aesthetic just to the lines layer
ggplot(gapminder, aes(x = year, y = lifeExp, group = country)) +
  geom_line(aes(color = continent)) +
  geom_point()
The order of the layers
Each layer is drawn on top of the previous layer. What happens if we switch the order of the layers?
# Rewrite the code above but with the points layer and line layer in the opposite order
ggplot(gapminder, aes(x = year, y = lifeExp, group = country)) +
  geom_point() +
  geom_line(aes(color = continent))
Setting aesthetics to a uniform (non-data) value
To change the aesthetic of all lines/points to a value that is not dictated by the data, you may think that aes(color="blue") should work, but it doesn’t.
Let’s try to set the color of our lines to “blue”:
# create the same line plot as above of year vs lifeExp for each country,
# but try to set the color of all of the lines to "blue" inside aes():
ggplot(gapminder) +
  geom_line(aes(x = year, y = lifeExp, group = country, color = "blue"))
When setting an aesthetic to a value that does not correspond to a variable from our data, we need to move the color specification outside of the aes() function:
# fix the above code by moving the `color` argument outside `aes()`
ggplot(gapminder) +
  geom_line(aes(x = year, y = lifeExp, group = country), color = "blue")
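The reason for the difference is that anything inside aes() is treated as data: "blue" becomes a one-level variable that ggplot2 maps through its default color scale (and gives a legend), rather than a literal color. The sketch below compares the two versions by inspecting the colors ggplot2 computes for each layer with layer_data(). It uses gapminder_toy, a hypothetical made-up stand-in for the lesson data, and the names mapped and fixed are only illustrative.

```r
library(ggplot2)

# Hypothetical stand-in for gapminder: made-up values, same column names
gapminder_toy <- data.frame(
  country = c("Country A", "Country A", "Country B", "Country B"),
  year    = c(2002, 2007, 2002, 2007),
  lifeExp = c(52, 55, 77, 79)
)

# inside aes(): "blue" is treated as data, so color becomes a mapped aesthetic
mapped <- ggplot(gapminder_toy) +
  geom_line(aes(x = year, y = lifeExp, group = country, color = "blue"))

# outside aes(): color is a fixed parameter applied to the whole layer
fixed <- ggplot(gapminder_toy) +
  geom_line(aes(x = year, y = lifeExp, group = country), color = "blue")

# compare the colors that each version actually draws with
unique(layer_data(mapped)$colour)
unique(layer_data(fixed)$colour)
```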
Transparency
Another helpful aesthetic is transparency, which is set using alpha:
# Add transparency (alpha = 0.2) to the previous line plot of year vs life exp
ggplot(gapminder) +
  geom_line(aes(x = year, y = lifeExp, group = country), color = "blue", alpha = 0.2)
Transformations
Recall our scatterplot of gdpPercap vs lifeExp (this time with transparency):
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.5)
Let’s add a scale layer to present the x-axis on a log10 scale:
# add a log-10 scale for the x-axis from the previous plot
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.5) +
  scale_x_log10()
Adding a linear fit
We can also fit a simple relationship to the data by adding another layer, geom_smooth():
# add a lm smooth layer to the previous plot
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.5) +
  scale_x_log10() +
  geom_smooth(method = "lm")
`geom_smooth()` using formula = 'y ~ x'
Try changing the width of the fitted line using the linewidth argument of geom_smooth().
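As a sketch of what that could look like, the code below sets linewidth as a fixed (non-data) value on the smooth layer. The value 2 is an arbitrary choice for illustration, and gapminder_toy is a hypothetical made-up stand-in for the lesson data. Note that the linewidth argument assumes ggplot2 3.4.0 or later, which matches the version loaded at the top of this lesson.

```r
library(ggplot2)

# Hypothetical stand-in for gapminder: made-up values, same column names
gapminder_toy <- data.frame(
  gdpPercap = c(400, 900, 2500, 7000, 18000, 40000),
  lifeExp   = c(44, 50, 61, 68, 75, 81)
)

smooth_plot <- ggplot(gapminder_toy, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.5) +
  scale_x_log10() +
  # linewidth is set outside aes(), so it applies to the whole fitted line
  geom_smooth(method = "lm", linewidth = 2)

smooth_plot
```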
Challenge 4a
Modify the following code so that all of the points are colored “orange” and have size equal to 3.
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.5, color = "orange", size = 3) +
  geom_smooth(method = "lm") +
  scale_x_log10()
`geom_smooth()` using formula = 'y ~ x'
Challenge 4b
Modify your solution to Challenge 4a so that the color of the points layer (but not the smooth layer) is instead determined by the continent variable and the size is determined by the pop variable. You should only have one smooth line in your final plot.
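One possible approach, sketched here rather than given as the definitive answer, is to move color and size into a local aes() on the points layer only. Because the smooth layer then inherits just the global x and y aesthetics, it fits a single line. The data frame gapminder_toy is a hypothetical stand-in with made-up values; use gapminder for the real challenge.

```r
library(ggplot2)

# Hypothetical stand-in for gapminder: made-up values, same column names
gapminder_toy <- data.frame(
  continent = c("Africa", "Africa", "Asia", "Asia", "Europe", "Europe"),
  gdpPercap = c(400, 900, 2500, 7000, 18000, 40000),
  lifeExp   = c(44, 50, 61, 68, 75, 81),
  pop       = c(2e6, 8e6, 5e7, 1e8, 1e7, 6e7)
)

plot_4b <- ggplot(gapminder_toy, aes(x = gdpPercap, y = lifeExp)) +
  # color and size are mapped to data for the points layer only
  geom_point(aes(color = continent, size = pop), alpha = 0.5) +
  # the smooth layer sees only the global x and y, so it draws one line
  geom_smooth(method = "lm") +
  scale_x_log10()

plot_4b
```

Putting color = continent in the global aes() instead would produce one fitted line per continent.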
Earlier we visualized the change in life expectancy over time across all countries in one plot like this:
ggplot(gapminder, aes(x = year, y = lifeExp, color = continent, group = country)) +
  geom_line()
Another way to view this data is to create a separate plot for each continent.
One way to do this would be to create a separate plot for each continent manually. For example:
# create a line plot for the countries in the Americas only
gapminder |>
  filter(continent == "Americas") |>
  ggplot(aes(x = year, y = lifeExp, group = country)) +
  geom_line()
# create a line plot for the countries in Europe only
gapminder |>
  filter(continent == "Europe") |>
  ggplot(aes(x = year, y = lifeExp, group = country)) +
  geom_line()
But there is a more efficient way to do this using facet_wrap():
# create a grid of line plots of year vs lifeExp for the countries in each continent
# using facet_wrap()
ggplot(gapminder, aes(x = year, y = lifeExp, group = country)) +
  geom_line() +
  facet_wrap(~continent)
Challenge 5
Create a faceted set of line plots for life expectancy versus year for each country in the Americas (e.g., each facet will contain the individual line plot for a single country in the Americas).
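A possible solution is sketched below: filter to the Americas, then facet by country so that each panel holds one line. It runs on gapminder_toy, a hypothetical stand-in with made-up values and placeholder country names; swap in gapminder to see the real countries.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for gapminder: made-up values, placeholder countries
gapminder_toy <- data.frame(
  continent = c(rep("Americas", 4), rep("Europe", 2)),
  country   = c("Country A", "Country A", "Country B", "Country B",
                "Country C", "Country C"),
  year      = c(2002, 2007, 2002, 2007, 2002, 2007),
  lifeExp   = c(70, 72, 74, 75, 78, 79)
)

plot_americas <- gapminder_toy |>
  filter(continent == "Americas") |>
  ggplot(aes(x = year, y = lifeExp)) +
  geom_line() +
  # one panel per country
  facet_wrap(~country)

plot_americas
```

No group aesthetic is needed here because each facet contains a single country.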
You can add labels to plots using the labs() function.
gapminder |>
  filter(country == "Brazil") |>
  ggplot() +
  geom_line(aes(x = year, y = lifeExp)) +
  # Add reasonable labels to the plot above
  labs(x = "Year", y = "Life expectancy", title = "Life expectancy by year in Brazil")
Challenge 6
Using geom_boxplot(), generate boxplots to compare life expectancy between the different continents (set x = continent and y = lifeExp), faceted by year.
Color each boxplot by continent and rename each label so that it is nicely formatted and human-readable.
Solution to Challenge 6
Here is a possible solution:
gapminder |>
  ggplot(aes(x = continent, y = lifeExp, fill = continent)) +
  geom_boxplot() +
  facet_wrap(~year) +
  labs(x = "Continent", y = "Life Expectancy", fill = "Continent")
Built-in themes
There are several themes for making your plots even prettier. For example,
theme_classic():
# add theme_classic() to the following plot
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(size = pop, color = continent)) +
  scale_x_log10() +
  theme_classic()
theme_minimal():
# add theme_minimal() to the following plot
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(size = pop, color = continent)) +
  scale_x_log10() +
  theme_minimal()
theme_bw():
# add theme_bw() to the following plot
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(size = pop, color = continent)) +
  scale_x_log10() +
  theme_bw()